35 research outputs found

The evolution of domain-content in bacterial genomes

Author: Molina Nacho
van Nimwegen Erik
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Across all sequenced bacterial genomes, the number of domains $n_c$ in different functional categories $c$ scales as a power law in the total number of domains $n$, i.e. $n_c \propto n^{\alpha_c}$, with exponents $\alpha_c$ that vary across functional categories. Here we investigate the implications of these scaling laws for the evolution of domain content in bacterial genomes and derive the simplest evolutionary model consistent with these scaling laws. RESULTS: We show that, using only an assumption of time invariance, the scaling laws uniquely determine the relative rates of domain additions and deletions across all functional categories and evolutionary lineages. First, the model predicts that the rate of additions and deletions of domains of category $c$ is proportional to the number of domains $n_c$ currently in the genome, and we discuss the implications of this observation for the role of horizontal transfer in genome evolution. Second, in addition to being proportional to $n_c$, the rate of additions and deletions of domains of category $c$ is proportional to a category-dependent constant $\rho_c$, which is the same for all evolutionary lineages. This 'evolutionary potential' $\rho_c$ represents the relative probability for additions/deletions of domains of category $c$ to be fixed in the population by selection and is predicted to equal the scaling exponent $\alpha_c$. By comparing the domain content of 93 pairs of closely related genomes from all over the phylogenetic tree of bacteria, we demonstrate that the model's predictions are supported by available genome-sequence data. CONCLUSION: Our results establish a direct quantitative connection between the scaling of domain numbers with genome size and the rate of additions and deletions of domains during short evolutionary time intervals.
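To make the scaling statement in this abstract concrete, the following is a minimal sketch of how an exponent could be estimated from per-genome counts: an ordinary least-squares line fitted to log(n_c) against log(n). The function name `fit_scaling_exponent` and the input numbers are illustrative assumptions, not the authors' method or data; the synthetic counts are constructed to follow an exact quadratic law.

```python
import math


def fit_scaling_exponent(total_domains, category_domains):
    """Return (alpha, prefactor) from a least-squares fit of
    log(n_c) = log(prefactor) + alpha * log(n)."""
    xs = [math.log(n) for n in total_domains]
    ys = [math.log(nc) for nc in category_domains]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line in log-log space is the exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(mean_y - alpha * mean_x)
    return alpha, prefactor


# Synthetic, made-up genomes (NOT real data): n_c = 0.001 * n**2 exactly.
totals = [500, 1000, 2000, 4000, 8000]
counts = [0.001 * n ** 2 for n in totals]

alpha, prefactor = fit_scaling_exponent(totals, counts)
print(f"alpha_c = {alpha:.3f}, prefactor = {prefactor:.4g}")
```

On real genomes the counts scatter around the line, so a fit of this kind yields an estimate with uncertainty rather than an exact exponent.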

CiteSeerX

Crossref

Springer - Publisher Connector

edoc

Directory of Open Access Journals

PubMed Central

SwissRegulon: a database of genome-wide annotations of regulatory sites

Author: Erb Ionas
Molina Nacho
Pachkov Mikhail
van Nimwegen Erik
Publication venue: Oxford University Press
Publication date: 27/11/2006

SwissRegulon is a database containing genome-wide annotations of regulatory sites in the intergenic regions of genomes. The regulatory site annotations are produced using a number of recently developed algorithms that operate on multiple alignments of orthologous intergenic regions from related genomes in combination with, whenever available, known sites from the literature and ChIP-on-chip binding data. Currently SwissRegulon contains annotations for yeast and 17 prokaryotic genomes. The database provides information about the sequence, location, orientation, posterior probability and, whenever available, binding factor of each annotated site. To enable easy viewing of the regulatory site annotations in the context of other features annotated on the genomes, the sites are displayed using the GBrowse genome browser interface and can be queried based on any annotated genomic feature. The database can also be queried for regulons, i.e. sites bound by a common factor.

Edinburgh Research Explorer

MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences

Author: Arnold Phil
Erb Ionas
Molina Nacho
Pachkov Mikhail
van Nimwegen Erik
Publication date: 02/08/2017

Motivation: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules, and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools only incorporate some of these features, and significant methodological hurdles have hampered their synthesis into a single consistent probabilistic framework. Results: We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all the features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction. Availability: Source code, a user manual and files with several example applications are available at www.swissregulon.unibas.ch. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

RERO DOC Digital Library

Interplay between stochasticity and negative feedback leads to pulsed dynamics and distinct gene activity patterns

Author: Agresti Alessandra
Bianchi Marco E.
Molina Nacho
Zambrano Samuel
Publication venue: 'American Physical Society (APS)'
Publication date: 14/08/2015

Edinburgh Research Explorer

Revealing Assembly of a Pore-Forming Complex Using Single-Cell Kinetic Analysis and Modeling

Author: Bischofberger Mirko
Boss Daniel
Iacovache Ioan
Molina Nacho
Naef Felix
van der Goot F. Gisou
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 12/04/2016

Many biological processes depend on the sequential assembly of protein complexes. However, studying the kinetics of such processes by direct methods is often not feasible. As an important class of such protein complexes, pore-forming toxins start their journey as soluble monomeric proteins, and oligomerize into transmembrane complexes to eventually form pores in the target cell membrane. Here, we monitored pore formation kinetics for the well-characterized bacterial pore-forming toxin aerolysin in single cells in real time to determine the lag times leading to the formation of the first functional pores per cell. Probabilistic modeling of these lag times revealed that one slow and seven equally fast rate-limiting reactions best explain the overall pore formation kinetics. The model predicted that monomer activation is the rate-limiting step for the entire pore formation process. We hypothesized that this could be through release of a propeptide and indeed found that peptide removal abolished these steps. This study illustrates how stochasticity in the kinetics of a complex process can be exploited to identify rate-limiting mechanisms underlying multistep biomolecular assembly pathways.
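As an illustration of the kind of lag-time model this abstract describes, the sketch below simulates a lag time as the sum of one slow and seven equally fast exponential waiting times and compares the sample mean with the analytic mean. It assumes the steps are sequential, independent and memoryless; the rate constants `K_SLOW` and `K_FAST` are hypothetical values chosen for the example, not numbers from the paper.

```python
import random


def simulate_lag_time(k_slow, k_fast, n_fast=7, rng=random):
    """Draw one lag time as a sum of sequential exponential waiting times:
    one step with rate k_slow followed by n_fast steps with rate k_fast."""
    lag = rng.expovariate(k_slow)
    for _ in range(n_fast):
        lag += rng.expovariate(k_fast)
    return lag


def expected_lag_time(k_slow, k_fast, n_fast=7):
    """Analytic mean of the lag time: the sum of the mean step durations."""
    return 1.0 / k_slow + n_fast / k_fast


# Hypothetical rates (per minute), chosen only for illustration.
K_SLOW, K_FAST = 0.05, 1.0

rng = random.Random(0)  # seeded so the run is reproducible
samples = [simulate_lag_time(K_SLOW, K_FAST, rng=rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)

print(f"simulated mean lag: {sample_mean:.2f}")
print(f"analytic mean lag:  {expected_lag_time(K_SLOW, K_FAST):.2f}")
```

With rates this far apart the slow step accounts for most of the mean lag, while the number of fast steps mainly affects the shape of the lag-time distribution at short times.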

Infoscience - École polytechnique fédérale de Lausanne

Elsevier - Publisher Connector

PubMed Central

Bern Open Repository and Information System (BORIS)

Structure of silent transcription intervals and noise characteristics of mammalian genes

Author: Benjamin Zoller
Damien Nicolas
Felix Naef
Green PJ
Nacho Molina
Zenklusen D
Publication venue: 'EMBO'
Publication date: 01/07/2015

Mammalian transcription occurs stochastically in short bursts interspersed by silent intervals showing a refractory period. However, the underlying processes and consequences on fluctuations in gene products are poorly understood. Here, we use single allele time-lapse recordings in mouse cells to identify minimal models of promoter cycles, which inform on the number and durations of rate-limiting steps responsible for refractory periods. The structure of promoter cycles is gene specific and independent of genomic location. Typically, five rate-limiting steps underlie the silent periods of endogenous promoters, while minimal synthetic promoters exhibit only one. Strikingly, endogenous or synthetic promoters with TATA boxes show simplified two-state promoter cycles. Since transcriptional bursting constrains intrinsic noise depending on the number of promoter steps, this explains why TATA box genes display increased intrinsic noise genome-wide in mammals, as revealed by single-cell RNA-seq. These findings have implications for basic transcription biology and shed light on interpreting single-cell RNA-counting experiments

Infoscience - École polytechnique fédérale de Lausanne

Crossref

PubMed Central

Edinburgh Research Explorer

Quantifying ChIP-seq data: A spiking method providing an internal reference for sample-to-sample normalization

Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) experiments are widely used to determine, within entire genomes, the occupancy sites of any protein of interest, including, for example, transcription factors, RNA polymerases, or histones with or without various modifications. In addition to allowing the determination of occupancy sites within one cell type and under one condition, this method allows, in principle, the establishment and comparison of occupancy maps in various cell types, tissues, and conditions. Such comparisons require, however, that samples be normalized. Widely used normalization methods that include a quantile normalization step perform well when factor occupancy varies at a subset of sites, but may miss uniform genome-wide increases or decreases in site occupancy. We describe a spike adjustment procedure (SAP) that, unlike commonly used normalization methods intervening at the analysis stage, entails an experimental step prior to immunoprecipitation. A constant, low amount from a single batch of chromatin of a foreign genome is added to the experimental chromatin. This "spike" chromatin then serves as an internal control to which the experimental signals can be adjusted. We show that the method improves similarity between replicates and reveals biological differences including global and largely uniform changes
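The idea of adjusting experimental signals to an internal spike can be sketched as follows. This is one simple way such an adjustment could be computed, assuming that each sample's signal is rescaled so that the spike-in read counts become equal across samples; the published procedure may differ in its details. The function names and all read counts are hypothetical.

```python
def spike_scale_factors(spike_reads):
    """Per-sample factors that equalize the spike-in read counts,
    using the sample with the fewest spike reads as the reference."""
    reference = min(spike_reads.values())
    return {sample: reference / reads for sample, reads in spike_reads.items()}


def adjust_signal(signal, factor):
    """Rescale the per-site experimental signal by a spike-derived factor."""
    return [value * factor for value in signal]


# Hypothetical read counts: reads mapping to the foreign (spike) genome,
# and per-site signal on the experimental genome.
spike_reads = {"control": 10000, "treated": 20000}
signals = {"control": [50, 80, 120], "treated": [100, 160, 240]}

factors = spike_scale_factors(spike_reads)
adjusted = {name: adjust_signal(signals[name], factors[name]) for name in signals}

for name in signals:
    print(name, factors[name], adjusted[name])
```

In this made-up example the treated sample shows twice the raw signal but also twice the spike reads, so after adjustment the two samples agree. A uniform genome-wide change in occupancy, by contrast, would alter the experimental signal without altering the spike reads and would therefore survive the adjustment.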

Crossref

Serveur académique lausannois

PubMed Central

Edinburgh Research Explorer

ZORA

Base de datos de abejas ibéricas [Iberian bee database]

Author: Aguado-Martín Luis Oscar
Alomar David
Arista Montserrat
Arroyo-Correa Blanca
Asís Josep D.
Azpiazu Celeste
Bartomeus Ignasi
Baños-Picón Laura
Beja Pedro
Boieiro Mário
Borges Paulo A. V.
Carvalheiro Luisa
Carvalho Rafael
Casimiro-Soriguer Ramón
Castro Silvia
Collado Miguel Ángel
Costa Joana
Cross Ian
De la Rúa Pilar
de Pablos Luis MIguel
de Paz Víctor
Díaz-Calafat Joan
Ferrero Victoria
Gaspar Hugo
Ghisbain Guillaume
González Bornay Guillermo
González-Estévez Miguel Ángel
Gómez José María
Gómez-Martínez Carmelo
Heleno Ruben
Herrera Jose M.
Hormaza Jose I.
Iriondo Jose M.
Kuhlmann Michael
Laiolo Paola
Lanuza Jose B.
Lara-Romero Carlos
Loureiro João
Lázaro Amparo
López-Angulo Jesús
López-Núñez Francisco A.
Magrach Ainhoa
Martínez-López Vicente
Martínez-Núñez Carlos
Michez Denis
Miñarro Marcos
Molina Francisco P.
Montero-Castaño Ana
Moreira Bruno
Morente-López Javier
Noval Fonseca Nacho
Núñez Carbajal Alejandro
Obeso José Ramón
Ornosa Concepción
Ortiz-Sánchez Francisco Javier
Pareja Bonilla Daniel
Patiny Sébastien
Penado Andreia
Picanço Ana
Ploquin Emilie F.
Rasmont Pierre
Rego Carla
Rey Pedro J.
Ribas-Marquès Elisa
Roberts Stuart P.M.
Rodriguez Marta
Rosas-Ramos Natalia
Santamaría Silvia
Sánchez Ana M.
Tobajas Estefanía
Tormos José
Torres Félix
Trillo Alejandro
Valverde Javier
Vilà Montserrat
Viñuela Elisa
Wood Thomas J.
Álvarez Fidalgo Piluca
Álvarez-Fidalgo Marián
Publication venue: 'Asociacion Española de Ecologia Terrestre (AEET)'
Publication date: 01/01/2022

Bees are an extremely diverse group, with more than 1000 species described from the Iberian Peninsula. They are excellent pollinators and provide numerous ecosystem services that are fundamental to most terrestrial ecosystems. Various human-induced environmental changes are leading to the decline of some populations of certain species. However, we know very little about the conservation status of most species, and for many of them we hardly know their true distribution across the Iberian Peninsula. Here, we present a collaborative effort to collate and curate a database of bee occurrences covering the Iberian Peninsula and the Balearic Islands, which will help answer questions about the distribution of the different species, habitat preference, phenology, or historical trends. In its current version, the database contains 87 684 records of 923 species collected between 1830 and 2022, of which 87% are georeferenced. Each record includes associated information on the sampling location (89%), the collector and the person who identified the species (64%), the date of capture (54%) and the plant species on which the bee was collected (20%). We believe that this database is the starting point to better understand and conserve bee biodiversity in the Iberian Peninsula and the Balearic Islands. It can be accessed through the following permanent link: https://doi.org/10.5281/zenodo.6354502
This database was produced with the support of the projects EUCLIPO (Fundação para a Ciência e a Tecnologia, LISBOA-01-0145-FEDER-028360/EUCLIPO) and SAFEGUARD (ref. 101003476 H2020-SFS-2019-2).

Epsilon Open Archive

Repositório da Universidade dos Açores

Digital.CSIC

Adelante / Endavant

Author: Alberola Crespo Nieves
Alcalá García Inmaculada
Alcolea Constanza
Alfaro Cremades Irene
Alfaro Poce Idis
Alonso Moreno Daniel
Amez Fernández Irene
Andrés Zanutigh Federico
Aragó Maicas Eugenia
Arias Gómez Jorge
Arias Jimmy
Arroyo Albert Alba
Asensio Mulet Maite
Asensio Yedra Bruno
Baón María
Benítez Aguilar Pablo
Bermejo Manuel
Bermell Montesa Josefa
Bermeo Camila
Bethencourt Rodríguez Elena
Bocardo Gema
Brañanova Pato Ana Pilar
Butron Gotzone
Calvo Palomares María
Camacho García Katty
Carrillo Franco Carmen
Clavell Roglá Nieves
Cámara García Ana Isabel
Daniel Arganis Mayra
de Miguel Cerrada Javier
del Brezo Margarita
Delgado Wicke Nacho
Dellagiovanna Natalia
Dumi Ana
Díaz Arteaga Fátima
Díez Ignacio
D’Orazio Gustavo
Ebrí Arian
Emmerich Paula
Esquivel Ocaña Juana
Fernàndez Camahort Mar
Fernández García Hugo
Ferràndez López Anna
Francés Dueñas Jesús
Garcia Achondia Joan
Garcia Carbó Aina
García Castro Emilia
García del Pino Adriel
García Garrido María Soledad
García Eva Margarita
García-Zeballos Juan Herminio
Gaño Ordóñez Paula
González Arango María Aurelia
González Márquez José Gregorio
González Fernando
Gómez Vitoria María Jesús
Hara Amanda
Hernández Duro Luis
Herranz Giménez María Dolores
Hidalgo Lozano Carmen
Iriarte Lorea Nagore
Jamko Euge
Jarque Blasco Maximiliano
Juárez Tamargo Cristina
Lakukarda
Lamas Ramírez Samantha Ivana
León Sorribes Javier
Liébana María Marta
López García Sofía
López Susana María
Madera Gómez Silvia
Madrid Christa
Mandolesi Juliana
Marcazzo Florencia
Martínez Arce Marisa
Martínez Blanco Ana
Martínez de la Casa González Elena
Martínez Marco Lledó
Mejías Sullón Shirley Denisse
Mendoza Josefina Anahí
Merino Alday Yolanda
Minniti Norma
Molina Ibáñez María Luisa
Montes Trinidad Óscar
Morales Saro Cristina
Nasarre Romero Sofía
Navajas Ortega Andrés
Navarro Cazorla Aitana
Navarro Ruiz Inmaculada
Ortolá Crespo Silvia
Palatsí Pinyana Caterina
Paricio del Castillo Rocío
Peres Díaz Daniel
Perrone Agostina Dánae
Pina Erik
Pinyana Carme
Poetajc
Poveda Bort Mireia
Prades Vinaixa Alba
Pérez Lara María Yahvé
Quintela Barba Soraya
Rambla Crespo Laura
Riesco Bernier Sylvie
Riquelme Nova Nancy Emilse
Rocafull Baixauli María Isabel
Rodríguez Briz Fernanda
Rojas Hernández Gina Alexandra
Rojas Andrea Fabiana
Roque Astobiza Matías
Rosas Jiménez Olga Livia
Ruiz del Valle Celia
Ruiz Embarba Susana
Salamán Gina
Samayoa Recinos Jasmín
Sandoval Pazos Santiago
Santisteban Delgado Sandra María
Satizábal Carlos
Scopa Fabián
Selva Villanueva María
Senent Galmés María José
Serrano Galdón Cristian
Signes Urrea Carmen
Silvestre Castelló Noelia
Smati Zahrouni Roua
Soria Somolinos María Nieves
Soto Vargas Carlos Andrés
Sumalla Benito Aranzazu
Suárez Gonzalo
Sánchez Arguiano Luis
Sánchez Sánchez Mari Cruz
Tejado Meco Eva
Valero Pico María
Valero Uceda María de la Paz
Valero Valero Dori
Venturini Sabrina Lorena
Vila Tordera Natàlia
Vilches González Soledad
Vilchis Palacios Alejandro Iván
Vázquez Salomón Francisco
Zuluaga Yamid
Publication venue: 'Universitat Jaume I'
Publication date: 01/11/2019

Seventh challenge for the eradication of violence against women, organized by the Institut Universitari d’Estudis Feministes i de Gènere "Purificación Escribano" of the Universitat Jaume I

Repositori Institucional de la Universitat Jaume I

Genome evolution and regulatory network structure in bacteria

Author: Molina Nacho
Publication date: 01/01/2010

Funes, in spite of his infallible memory, was not capable of thought since, as J.L. Borges writes, "to think is to forget differences, generalize, make abstractions." Due to the latest technological advances, biology seems to be entering a Funes-like state: biologists can amass more experimental data about the organisms they study than ever before, and store these "memories" in huge databases. A fundamental question arises: can the scientific community synthesize this information and turn it into powerful abstract theories? Is abstraction possible or even desirable in such a complex discipline as biology? From the point of view of a physicist I believe that a theoretical biology is both possible and desirable.
Several quantitative laws have recently come to light in biology, particularly in the evolution and regulatory architecture of genomes. This thesis explores the implications for genome evolution and regulatory network structure of one such law: the scaling of the functional content of genomes with their size. This was the starting point of this thesis, which hopefully represents a small step towards a general theory of genome evolution and regulatory network structure in bacteria.
Genome evolution: Darwin's original work established the basis of the theory of evolution, postulating that traits spread in populations by natural selection. This fundamental understanding was partially changed by the discovery that DNA carries heritable genetic information, leading to the beginning of a new era of molecular evolution. Comparing orthologous mammalian DNA sequences to the fossil record indicated that the rate of amino acid substitutions was roughly constant in time. However, these substitutions were fixed in populations too often to have been the result of selection. The high rate of fixation led Kimura to formulate his neutral theory of molecular evolution.
Since then, neutral evolution has been the null model of sequence evolution, which has permitted the rigorous reconstruction of phylogenies and the detection of selection on gene sequences. Today the available sequences have grown from a few genetic loci to hundreds of whole annotated genomes. This wealth of data permits us to look beyond amino acid substitutions and study the variation in gene content and structure of genomes as a whole. In fact, several studies have shown that even closely related genomes with few substitutions often have enormous differences in gene content. These results highlight that changes at higher levels of organization have an essential role in the evolutionary process and therefore in the diversity of life. The main forces causing these changes, i.e. shaping the gene content of genomes, are gene duplication, gene deletion and horizontal gene transfer, leading to the acquisition of genes with new functions, the subfunctionalization of existing functions, or the deletion of genes whose functions are no longer required.
Studies of gene content have uncovered several striking quantitative laws that are directly related to genome evolution. First of all, it was noticed that a number of key genomic quantities show power-law distributions. In particular, the distribution of gene family sizes is a power law in each genome, whose exponent appears to depend mostly on the size of the genome. Several theoretical models have been put forth to explain these power-law distributions, all of which include gene duplications, gene deletions and gene innovation as key ingredients. Another striking observation is that the numbers of genes in different functional categories scale as power laws in the total number of genes in the genome.
For example, whereas the numbers of genes involved in different types of metabolism scale approximately linearly with genome size, the number of genes involved in regulatory processes such as transcription regulation and signal transduction scales almost quadratically with genome size, and the number of genes involved in basic processes such as DNA replication or cell division scales with an exponent less than 1. Such scaling laws are observed for the large majority of high-level functional categories. As argued before, these scaling laws have important implications for the evolutionary dynamics of gene duplications and deletions. This thesis focuses on how the functional content of genomes scales with genome size. We show that these scaling laws hold across bacterial clades, and formulate the simplest null model that accounts for them. The scaling exponents emerge as universal constants of genome evolution. We test the model's predictions against the protein domain content of closely related genomes by estimating the number of domain additions and deletions in each pair of genomes since they diverged from their last common ancestor. The available data support nearly all of the model's predictions. Finally, we discuss the implications of our work for the role of horizontal gene transfer in genome evolution.
Regulatory networks: We can view a bacterial cell as an entity made up of many molecular components that is capable of sensing many internal and external physico-chemical signals, and executing specific cellular programs in response. The realization of each program produces certain concentrations of specific proteins that act in some fashion beneficial to the cell. Thus, to understand the cell's dynamics, we must know how the protein concentrations change in response to the environment.
Transcription of genes into mRNA molecules is one of the most important stages of protein biosynthesis.
Transcription is regulated by specific proteins which are collectively called transcription factors. In response to stimuli, transcription factors bind specifically to DNA by recognizing short DNA sequences upstream of genes. Upon binding, they activate or repress transcription of genes into mRNA, i.e. transcription factors activate or repress gene expression. The set of all interactions between transcription factors and their regulated target genes forms the so-called transcriptional regulatory network. Therefore, understanding this network is essential to understanding the cell's response to its environment. The topological features of the transcriptional regulatory networks of E. coli and S. cerevisiae have been intensely studied, and some of their global and local properties have been uncovered in recent years. For instance, some studies have shown that the distribution of the number of genes that are regulated by a particular transcription factor (the out-degree) follows a power law, while the number of transcription factors regulating a particular gene (the in-degree) follows an exponential distribution.
Globally, these networks are organized into subnetworks which show a hierarchical internal structure with very few feedback interactions except for self-regulation. Interestingly, it has been demonstrated experimentally that these subnetworks process specific environmental signals. Locally, certain motifs formed by a few nodes appear more often than in random networks with the same degree distributions. The information-processing properties of these motifs have been studied individually, as well as how they aggregate to form higher structures. However, it is not clear whether these motifs have been positively selected by evolution due to their particular functions, or whether they are a side effect of the evolution of the regulatory network. Some of these results are still controversial, and it is important to recall that they were obtained on incomplete networks.
They may not hold once the full networks are known. All the results above come from a small number of model organisms. Therefore, little is known about how the global structure of transcription regulatory networks varies across bacteria. Strikingly, the number of transcription factors grows roughly quadratically with the size of the genome. For example, according to the DBD database, the number of transcription factors per genome in bacteria varies from only 3 (of a total of 504 genes) in Buchnera aphidicola, to 801 (of a total of 7717 genes) in Burkholderia sp. 383. To put the latter number in perspective, the vastly bigger genomes of C. elegans and D. melanogaster have a lower estimated total number of transcription factors according to the same database. The enormous range in the number of transcription factors across bacteria reflects a corresponding range in the complexity of gene regulation. For example, Buchnera lives in a very stable environment as an endosymbiont of aphids, and shows little transcriptional regulation. In contrast, Burkholderia can live under extremely diverse ecological conditions, including in soil, in water, as a plant pathogen, and as a human pathogen, which most likely requires complex regulatory mechanisms. This scaling property of the number of transcription factors has important implications for the structure of transcription regulatory networks. The total number of interactions between transcription factors and regulated genes is given by the number of transcription factors r times the average number of interactions per transcription factor, but also by the total number of genes times the average number of transcription factors that regulate a gene. Since the number of transcription factors per gene grows linearly with the total number of genes, the average number of interactions per transcription factor and the average number of transcription factors that regulate a gene cannot both be the same in bacteria of different genome sizes.
That is, either genes are regulated by more transcription factors in larger genomes or the regulon size decreases with genome size. Which of these scenarios is the one that occurs in nature? This thesis addresses this question.
However, answering this question directly requires knowing a large number of transcriptional regulatory networks, and very few such networks are available. Instead, we use an indirect procedure based on the assumption that regulatory sites on the genome evolved under purifying selection. We develop a novel method to measure purifying selection in intergenic regions. Our procedure starts from a set of related bacterial genomes (a clade) as provided by the NCBI microbial genome database, of which one is designated the reference species. For each gene and each intergenic region of the reference species we extract orthologous genes and intergenic regions from the other species and produce multiple alignments. We determine cliques of orthologous proteins (sets of genes that are all mutual orthologs between all species in the clade) and infer the topology of the phylogenetic tree from the concatenated alignment of all cliques. Then, we evaluate the amount of selection for each alignment column by the likelihood ratio of two evolutionary models: the background model, which assumes a simple F81 substitution rate model parameterized by an overall mutation rate and a vector of equilibrium base frequencies, and the foreground model, which assumes the same substitution rate model but with an unknown site-specific set of base frequencies that accounts for the selection acting on that site and is integrated out of the likelihood. Some of these techniques were integrated into MotEvo, a novel tool for detecting binding sites in intergenic alignments given known weight matrices.
We applied our method to 22 different bacterial clades which span the whole phylogenetic tree widely.
We identified segments in the intergenic regions of the analyzed bacteria that show evidence of purifying selection. To evaluate the performance of our method for detecting real binding sites, we studied the overlap between the identified segments and experimentally verified binding sites of E. coli. The results show that we are able to detect real binding sites based on conservation. We obtained purifying selection profiles with respect to gene start and stop sites, revealing universal patterns across species. One of the most remarkable patterns is the selection that takes place around the start codon, which is shown to be connected to translational efficiency. We observed, in almost all clades, a relatively higher frequency of adenine around the start codon, which we showed is related to the avoidance of RNA secondary structure in that region. Coming back to our starting question: how does the number of binding sites scale with genome size? To answer this, we studied the amount of purifying selection in intergenic regions across the 22 bacterial clades. Strikingly, the amount of purifying selection in intergenic regions does not vary with genome size. Moreover, the most conserved DNA words in intergenic regions showed higher diversity in large genomes than in small ones. These results strongly indicate that the structure of transcription regulatory networks changes dramatically with genome size: small genomes have few transcription factors each binding to many sites, while large genomes have many transcription factors each binding to a few sites. In other words, gene regulatory complexity is limited across bacteria while transcription factors become specialized in large genomes.
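The counting argument in this abstract about transcription factors and regulated genes can be written out explicitly. The notation below is introduced here for illustration and is not taken from the thesis: $g$ is the number of genes, $r$ the number of transcription factors, $\langle k_{\mathrm{out}} \rangle$ the average number of target genes per transcription factor, and $\langle k_{\mathrm{in}} \rangle$ the average number of transcription factors regulating a gene.

```latex
% Counting the total number of regulatory interactions E in two ways:
E \;=\; r \,\langle k_{\mathrm{out}} \rangle \;=\; g \,\langle k_{\mathrm{in}} \rangle
\quad\Longrightarrow\quad
\frac{\langle k_{\mathrm{in}} \rangle}{\langle k_{\mathrm{out}} \rangle} \;=\; \frac{r}{g}.

% With the roughly quadratic scaling r \propto g^{2} quoted in the abstract:
\frac{\langle k_{\mathrm{in}} \rangle}{\langle k_{\mathrm{out}} \rangle} \;\propto\; g.

% Values quoted in the abstract (DBD database):
\frac{r}{g}\bigg|_{\textit{Buchnera}} = \frac{3}{504} \approx 0.006,
\qquad
\frac{r}{g}\bigg|_{\textit{Burkholderia}} = \frac{801}{7717} \approx 0.104.
```

Because the ratio grows with genome size, the two averages cannot both stay constant: either $\langle k_{\mathrm{in}} \rangle$ increases or $\langle k_{\mathrm{out}} \rangle$ decreases in larger genomes, which are the two scenarios the abstract distinguishes.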

edoc